Abstract: Sequence optimization, where the items in a list are 
ordered to maximize some reward has many applications such 
as web advertisement placement, search, and control libraries 
in robotics. Previous work in sequence optimization produces a 
static ordering that does not take any features of the item or 
context of the problem into account. In this work, we propose a 
general approach to order the items within the sequence based 
on the context (e.g., perceptual information, environment description, 
and goals). We take a simple, efficient, reduction-based 
approach where the choice and order of the items is established 
by repeatedly learning simple classifiers or regressors for each 
"slot" in the sequence. Our approach leverages recent work on 
submodular function maximization to provide a formal regret 
reduction from submodular sequence optimization to simple cost- 
sensitive prediction. We apply our contextual sequence prediction 
algorithm to optimize control libraries and demonstrate results 
on two robotics problems: manipulator trajectory prediction and 
mobile robot path planning. 

I. Introduction 

Optimizing the order of a set of choices is fundamental to 
many problems such as web search, advertisement placements 
as well as in robotics and control. Relevance and diversity are 
important properties of an optimal ordering or sequence. In 
web search, for instance, if the search term admits many differ- 
ent interpretations then the results should be interleaved with 
items from each interpretation [16]. Similarly, in advertisement 
placement on web pages, advertisements should be chosen 
such that within the limited screen real estate they are diverse 
yet relevant to the page content. In robotics, control libraries 
have the same requirements for relevance and diversity in the 
ordering of member actions. In this paper, we apply sequence 
optimization to develop near-optimal control libraries. In the 
context of control libraries, a sequence refers to a ranked 
list of control action choices rather than a series of actions 
to be taken. Examples of control actions include grasps for 
manipulation, trajectories for mobile robot navigation or seed 
trajectories for initializing a local trajectory optimizer. 

Control libraries are a collection of control actions obtained 
by sampling a useful set of often high dimensional control 
trajectories or policies. Examples of control libraries include a 
collection of feasible grasps for manipulation [5], a collection 
of feasible trajectories for mobile robot navigation [9], and 
a collection of expert-demonstrated trajectories for a walking 
robot (Stolle et al. [20]). Similarly, recording demonstrated 
trajectories of experts aggressively flying unmanned aerial 
vehicles (UAVs) has enabled dynamically feasible trajectories 
to be quickly generated by concatenating a suitable subset of 
stored trajectories in the control library [8]. 

Such libraries are an effective means of spanning the space 
of all feasible control actions while at the same time dealing 
with computational constraints. The performance of control 
libraries on the specified task can be significantly improved by 
careful consideration of the content and order of actions in the 
library. To make this clear let us consider specific examples: 

Mobile robot navigation. In mobile robot navigation the 
task is to find a collision-free, low cost of traversal path 
which leads to the specified goal on a map. Since sensor 
horizons are finite and robots usually have constrained motion 
models and non-trivial dynamics, a library of trajectories 
respecting the dynamic and kinematic constraints of the robot 
are precomputed and stored in memory. This constitutes the 
control library. It is desired to sample a subset of trajectories 
at every time step so that the overall cost of traversal of the 
robot from start to goal is minimized. 

Trajectory optimization. Local trajectory optimization 
techniques are sensitive to initial trajectory seeds. Bad trajec- 
tory initializations may lead to slow optimization, suboptimal 
performance, or even remain in collision. In this setting, 
the control actions are end-to-end trajectory seeds that act 
as input to the optimization. Zucker [27] and Jetchev et 
al. [11] proposed methods for predicting trajectories from a 
precomputed library using features of the environment, yet 
these methods do not provide recovery methods when the 
prediction fails. Having a sequence of initial trajectory seeds 
provides fallbacks should earlier ones fail. 

Grasp libraries. During selection of grasps for an object, 
a library of feasible grasps can be evaluated one at a time 
until a collision-free, reachable grasp is found. While a naive 
ordering of grasps can be based on force closure and stability 
criteria O, if a grasp fails, then grasps similar to it are also 
likely to fail. A more principled ordering approach which takes 
into account features of the environment can reduce depth of 
the sequence that needs to be searched by having diversity in 
higher ranked grasps. 

Current state-of-the-art methods in the problems we address 
either predict only a single control action in the library that 
has the highest score for the current environment, or use an 
ad-hoc ordering of actions such as random order or by past 
rate of success. If the predicted action fails then systems 
(e.g., manipulators and autonomous vehicles) are unable to 
recover or have to fall back on some heuristic/hard-coded 
contingency plan. Predicting a sequence of options to evaluate 
is necessary for having intelligent, robust behavior. Choosing 
the order of evaluation of the actions based on the context of 
the environment leads to more efficient performance. 

A naive way of predicting contextual sequences would be 
to train a multi-class classifier over the label space consisting 
of all possible sequences of a certain length. This space is 
exponential in the number of classes and sequence length 
posing information theoretic difficulties. A more reasonable 
method would be to use the greedy selection technique by 
Streeter et al. [21] over the hypothesis space of all predictors 
which is guaranteed to yield sequences within a constant 
factor of the optimal sequence. Implemented naively, this 
remains expensive as it must explicitly enumerate the label 
space. Our simple reduction based approach where we propose 
to train multiple multi-class classifiers/regressors to mimic 
greedy selection given features of the environment is both 
efficient and maintains performance guarantees of the greedy 
selection. 

Perception modules using sensors such as cameras and li- 
dars are part and parcel of modern robotic systems. Leveraging 
such information in addition to the feedback of success or 
failure is conceptually straightforward: instead of considering 
a sequence of control actions, we consider a sequence of 
classifiers which map features X to control actions A, and 
attempt to find the best such classifier at each slot in the 
control action sequence. By using contextual features, our 
method has the benefit of closing the loop with perception 
while maintaining the performance guarantees in Streeter et 
al. [21]. 

The outlined examples present loss functions that depend 
only on the "best" action in the sequence, or attempt to 
minimize the prediction depth to find a satisfactory action. 
Such loss functions are monotone submodular, i.e., they exhibit 
diminishing returns.^ We define these functions in Section 
II and review the online submodular function maximization 
approach of Streeter et al. [21]. We also describe our contextual 
sequence optimization (ConSeqOpt) algorithm in detail. 
Section III shows our algorithm's performance improvement 
over alternatives for local trajectory optimization for manipulation 
and in path planning for mobile robots. 
Our contributions in this work are: 

• We propose a simple, near-optimal reduction for contex- 
tual sequence optimization. Our approach moves from 
predicting a single decision based on features to making 
a sequence of predictions, a problem that arises in many 
domains including advertisement prediction [22, 16] and 
search. 

• The application of this technique to the contextual opti- 
mization of control libraries. We demonstrate the efficacy 
of the approach on two important problems: robot ma- 
nipulation planning and mobile robot navigation. Using 

^ For more information on submodularity and optimization of submodular 
functions we refer readers to the tutorial |4|. 



the sequence of actions generated by our approach we 
observe improvement in performance over sequences 
generated by either random ordering or decreasing rate 
of success of the actions. 

• Our algorithm is generic and can be naturally applied 
to any problem where ordered sequences (e.g., advertise- 
ment placement, search, recommendation systems, etc) 
need to be predicted and relevance and diversity are 
important. 

II. Contextual Optimization of Sequences 

A. Background 

The control library is a set $\mathcal{Y}$ of actions. Each action is 
denoted by $a \in \mathcal{Y}$. Formally, a function 
$f : \mathcal{S} \to \mathbb{R}_+$ is monotone submodular for any 
sequence $S \in \mathcal{S}$, where $\mathcal{S}$ is the set of all 
sequences of actions, if it satisfies the following two properties: 

• (Monotonicity) for any sequences $S_1, S_2 \in \mathcal{S}$, 
$f(S_1) \le f(S_1 \oplus S_2)$ and $f(S_2) \le f(S_1 \oplus S_2)$ 

• (Submodularity) for any sequences $S_1, S_2 \in \mathcal{S}$ and 
any action $a \in \mathcal{Y}$, 
$f(S_1 \oplus S_2 \oplus \langle a \rangle) - f(S_1 \oplus S_2) \le f(S_1 \oplus \langle a \rangle) - f(S_1)$ 

where $\oplus$ denotes order-dependent concatenation of sequences. 
These imply that the function never decreases as more actions 
are added to the sequence (monotonicity), but the gain obtained 
by adding an action to a larger pre-existing sequence is no greater 
than the gain from adding it to a smaller pre-existing sequence 
(submodularity). 

For control library optimization, we attempt to optimize one 
of two possible criteria: the cost of the best action $a$ in a 
sequence (with a budget on sequence size) or the time (depth 
in sequence) to find a satisficing action. For the former, we 
consider the function 

$$f \equiv N_o - \min(\mathrm{cost}(a_1), \mathrm{cost}(a_2), \ldots, \mathrm{cost}(a_N)) \quad (1)$$

where cost is an arbitrary cost on an action ($a_i$) given an 
environment and $N_o$ is a constant, positive normalizer which 
is the highest cost.^ Note that $f$ takes in as arguments the 
sequence of actions $a_1, a_2, \ldots, a_N$ directly, but is also implicitly 
dependent on the current environment on which the actions are 
evaluated in $\mathrm{cost}(a_i)$. Dey et al. [6] prove that this criterion is 
monotone submodular in sequences of control actions and can 
be maximized, within a constant factor, by greedy approaches 
similar to Streeter et al. [21]. 
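As a minimal sketch of the budgeted criterion above, the following Python evaluates the objective (normalizer minus the minimum cost in the sequence), averages it over a few environments, and builds a sequence greedily by marginal gain. The cost table, the action names, and the helper names (`f`, `expected_f`, `greedy_sequence`) are illustrative assumptions, not values or code from the paper; the empty sequence is assumed to have value 0.

```python
from itertools import permutations
import math

def f(seq, costs, n_o):
    """Normalizer minus the minimum cost in the sequence; 0 for the empty sequence."""
    if not seq:
        return 0.0
    return n_o - min(costs[a] for a in seq)

def expected_f(seq, envs, n_o):
    """Average of f over a list of per-environment cost dictionaries."""
    return sum(f(seq, costs, n_o) for costs in envs) / len(envs)

def greedy_sequence(actions, envs, n_o, budget):
    """Fill each slot with the action that gives the largest marginal gain."""
    seq = []
    for _ in range(budget):
        base = expected_f(seq, envs, n_o)
        best = max(actions, key=lambda a: expected_f(seq + [a], envs, n_o) - base)
        seq.append(best)
    return seq

# Hypothetical costs of three actions in three environments, all in [0, 1].
envs = [
    {"a": 0.1, "b": 0.9, "c": 0.8},
    {"a": 0.2, "b": 0.9, "c": 0.7},
    {"a": 0.9, "b": 0.3, "c": 0.4},
]
N_O = 1.0  # highest possible cost, used as the normalizer
actions = ["a", "b", "c"]

greedy = greedy_sequence(actions, envs, N_O, budget=2)
greedy_value = expected_f(greedy, envs, N_O)
# Brute-force optimum over all ordered pairs, feasible only for tiny libraries.
best_value = max(expected_f(list(p), envs, N_O) for p in permutations(actions, 2))
```

On this toy table the greedy sequence happens to match the brute-force optimum; in general only the constant-factor guarantee holds.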

For the latter optimization criterion, which arises in grasping 
and trajectory seed selection, we define the monotone submodular 
function $f : \mathcal{S} \to [0, 1]$ as $f \equiv P(S)$, where 
$P(S)$ is the probability of successfully grasping an object in 
^In this work we assume that each action choice takes the same time to 
execute although the proposed approach can be readily extended to handle 
different execution times. 

^For mobile robot path planning, for instance, $\mathrm{cost}(a_i)$ is typically a 
simple measure of mobility penalty based on terrain for traversing a trajectory 
$a_i$ sampled from a set of trajectories and terminating in a heuristic cost-to-go 
estimate, computed by, e.g., A*. 



a given scenario using the sequence of grasps provided. It 
is easy to check [6] that this function is also monotone and 
submodular, as the probability of success never decreases as 
we consider additional elements. Minimizing the depth in the 
control library to be evaluated becomes our goal. In the rest of 
the paper all objective functions are assumed to be monotone 
submodular unless noted otherwise. 

While optimizing these over library actions is effective, the 
ordering of actions does not take into account the current 
context. People do not attempt to grasp objects based only 
on previous performance of grasps: they take into account 
the position, orientation of the object, the proximity and 
arrangement of clutter around the object and also their own 
position relative to the object in the current environment. 

B. Our Approach 

We consider functions that are submodular over sequences 
of either control actions themselves or, crucially, over classifiers 
that take as input environment features X and map 
to control actions. Additionally, by considering many 
environments, the expectation of $f$ in equation (1) over these 
environments also maintains these properties. In our work, we 
always consider the expected loss averaged over a (typically 
empirical) distribution of environments. 

In Algorithm 1 we present a simple approach for learning 
such a near-optimal contextual control library. 

C. Algorithm for Contextual Submodular Sequence Optimiza- 
tion 

Figure 1 shows the schematic diagram for Algorithm 1, which 
trains a classifier for each slot of the sequence. Define matrix 
X to be the set of features from a distribution of example 
environments (one feature vector per row) and matrix Y to be 
the corresponding target action identifier for each example. 
Let each feature vector contain L attributes. Let D be the 
set of example environments containing $|D|$ examples. The 
size of X is $|D| \times L$ and the size of Y is $|D| \times 1$. We denote the 
$i$-th classifier by $\pi_i$. Define $M_{L_i}$ to be the matrix of marginal 
losses for each environment for the $i$-th slot of the sequence. 
In the parlance of cost-sensitive learning, $M_{L_i}$ is the 
example-dependent cost matrix. $M_{L_i}$ is of dimension $|D| \times |\mathcal{Y}|$. Each 
row of $M_{L_i}$ contains, for the corresponding environment, the 
loss suffered by the classifier for selecting a particular action 
$a \in \mathcal{Y}$. The most beneficial action has 0 loss while others 
have non-zero losses. These losses are normalized to be within 
[0, 1]. We detail how to calculate the entries of $M_{L_i}$ below. 
Classifier inputs are the set of feature vectors X for the dataset 
of environments and the marginal loss matrix $M_{L_i}$. 

For ease of understanding let us walk through the training 
of the first two classifiers $\pi_1$ and $\pi_2$. 

Consider the first classifier training in Figure 1 and its inputs 
X and $M_{L_1}$. Consider the first row of $M_{L_1}$. Each element of 
this row corresponds to the loss incurred if the corresponding 
action in the library $\mathcal{Y}$ were taken on the corresponding environment 
whose features are in the first row of X. The best action has 
0 loss while all the others have relative losses in the range [0, 1] 
depending on how much worse they are compared to the best 
action. The rest of the rows in $M_{L_1}$ are filled out in the same way. 
The cost-sensitive classifier $\pi_1$ is then trained. The set of features X 
from each environment in the training set is again presented 
to it to classify. The output is the matrix $Y_{\pi_1}$, which contains the 
selected action for the first slot for each environment. As no 
element of the hypothesis class performs perfectly, this results 
in a $Y_{\pi_1}$ where not every environment had the 0-loss action 
picked. 

Fig. 1: Schematic of training sequence of classifiers for regret 
reduction of contextual sequence optimization to multi-class, 
cost-sensitive classification 

Algorithm 1 Algorithm for training ConSeqOpt using classifiers 

Input: sequence length N, multi-class cost-sensitive classifier routine $\pi$, dataset D of 
$|D|$ environments and associated features X, library of control actions $\mathcal{Y}$ 
Output: sequence of classifiers $\pi_1, \pi_2, \ldots, \pi_N$ 
1: for $i = 1$ to $N$ do 
2:   $M_{L_i} \leftarrow$ computeTargetActions($X, Y_{\pi_1}, Y_{\pi_2}, \ldots, Y_{\pi_{i-1}}, \mathcal{Y}$) 
3:   $\pi_i \leftarrow$ train($X, M_{L_i}$) 
4:   $Y_{\pi_i} \leftarrow$ classify($X$) 
5: end for 

Algorithm 2 Algorithm for training ConSeqOpt using regressors 

Input: sequence length N, regression routine $\Re$, dataset D of $|D|$ 
environments, library of control actions $\mathcal{Y}$ 
Output: sequence of regressors $\Re_1, \Re_2, \ldots, \Re_N$ 
1: for $i = 1$ to $N$ do 
2:   $X_i, M_{B_i} \leftarrow$ computeFeatures&Benefit($D, Y_{\Re_1}, Y_{\Re_2}, \ldots, Y_{\Re_{i-1}}, \mathcal{Y}$) 
3:   $\Re_i \leftarrow$ train($X_i, M_{B_i}$) 
4:   $\tilde{M}_{B_i} \leftarrow$ regress($X_i, \Re_i$) 
5:   $Y_{\Re_i} = \operatorname{argmax}(\tilde{M}_{B_i})$ 
6: end for 

Consider the second classifier training in Figure 1. Consider 
the first row of $M_{L_2}$. Suppose control action id 13 was 
selected by classifier $\pi_1$ in the classification step for the first 
environment, which provides a gain of 0.6 to the objective 
function $f$, i.e., $f([13]) = 0.6$. For each of the control actions $a$ 
present in the library $\mathcal{Y}$, find the action which provides the 
maximum marginal improvement, i.e., 
$a_{\max} = \operatorname{argmax}_a (f([13, a]) - f([13])) = \operatorname{argmax}_a (f([13, a]) - 0.6)$. 
Additionally, convert the marginal gains computed for each $a$ in the 
library to proportional losses and store them in the first row of $M_{L_2}$. 
If $a_{\max}$ is the action with the maximum marginal gain, then the loss 
for each of the other actions is $f([13, a_{\max}]) - f([13, a])$. $a_{\max}$ has 
0 loss while other actions have >= 0 loss. The rest of the rows 
are filled up similarly. $\pi_2$ is trained and evaluated on the same 
dataset to produce $Y_{\pi_2}$. 
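The construction of one row of the marginal loss matrix for the second slot can be sketched as follows. This is a hedged illustration only: the per-action costs are made up so that action 13 yields the gain of 0.6 used in the walkthrough, the objective is assumed to be one minus the minimum cost, and `objective` and `marginal_loss_row` are hypothetical helper names.

```python
def objective(costs, seq):
    """Assumed objective: 1 - min cost over the sequence, 0 for the empty sequence."""
    return 1.0 - min(costs[a] for a in seq) if seq else 0.0

def marginal_loss_row(costs, prefix, actions):
    """Loss of each action = best marginal gain minus that action's marginal gain."""
    base = objective(costs, prefix)
    gains = {a: objective(costs, prefix + [a]) - base for a in actions}
    best_gain = max(gains.values())
    return {a: best_gain - g for a, g in gains.items()}

# Hypothetical costs in one environment; action 13 gives f([13]) = 0.6.
costs = {13: 0.4, 7: 0.1, 21: 0.3, 5: 0.8}
row = marginal_loss_row(costs, [13], list(costs))
a_max = min(row, key=row.get)  # the action with zero loss
```

Here action 7 has the largest marginal gain and therefore zero loss, while repeating action 13 or choosing action 5 adds nothing and receives the largest loss.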

This procedure is repeated for all N slots, producing a sequence 
of classifiers $\pi_1, \pi_2, \ldots, \pi_N$. The idea is that a classifier 
must suffer a high loss when it chooses a control action 
which provides little marginal gain when a higher gain action 
was available. Any cost-sensitive multi-class classifier may be 
used. 

During test time, for a given environment, features are 
extracted, and the classifier associated with each slot of the 
sequence outputs a control action to fill the slot. This sequence 
can then be evaluated as usual. This procedure is formalized in 
Algorithm 1. In computeTargetActions the previously 
detailed procedure for calculating the entries of the marginal 
loss matrix $M_{L_i}$ for the $i$-th slot is carried out, followed by the 
training step in train and classification step in classify. 
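The slot-by-slot training loop can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the objective (one minus the minimum cost), the four toy environments, and especially `train_bucket_classifier`, a deliberately weak stand-in for a real cost-sensitive multi-class learner that splits on the first feature and picks the action with the smallest summed loss in each bucket.

```python
def objective(costs, seq):
    """Assumed objective: 1 - min cost over the sequence, 0 for the empty sequence."""
    return 1.0 - min(costs[a] for a in seq) if seq else 0.0

def marginal_loss_row(costs, prefix, actions):
    """One row of the marginal loss matrix for a single environment."""
    base = objective(costs, prefix)
    gains = {a: objective(costs, prefix + [a]) - base for a in actions}
    best_gain = max(gains.values())
    return {a: best_gain - g for a, g in gains.items()}

def train_bucket_classifier(X, M):
    """Toy cost-sensitive learner (hypothetical): threshold x[0] at 0.5 and, in
    each bucket, choose the action with the smallest total loss."""
    totals = {}
    for x, row in zip(X, M):
        acc = totals.setdefault(x[0] < 0.5, {a: 0.0 for a in row})
        for a, loss in row.items():
            acc[a] += loss
    choice = {bucket: min(acc, key=acc.get) for bucket, acc in totals.items()}
    return lambda x: choice[x[0] < 0.5]

def conseqopt_train(X, env_costs, actions, n_slots, train):
    """Train one classifier per slot on losses that depend on earlier picks."""
    classifiers = []
    chosen = [[] for _ in env_costs]  # actions predicted so far, per environment
    for _ in range(n_slots):
        M = [marginal_loss_row(c, chosen[n], actions) for n, c in enumerate(env_costs)]
        clf = train(X, M)
        classifiers.append(clf)
        for n, x in enumerate(X):
            chosen[n].append(clf(x))
    return classifiers

def conseqopt_predict(classifiers, x):
    """At test time each slot's classifier fills its slot from the features."""
    return [clf(x) for clf in classifiers]

# Hypothetical training set: one feature per environment and per-action costs.
X = [[0.1], [0.2], [0.8], [0.9]]
env_costs = [
    {"a": 0.1, "b": 0.9, "c": 0.8},
    {"a": 0.9, "b": 0.2, "c": 0.8},
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.9, "b": 0.3, "c": 0.2},
]
classifiers = conseqopt_train(X, env_costs, ["a", "b", "c"], 2, train_bucket_classifier)
sequence = conseqopt_predict(classifiers, [0.15])
```

Because the first classifier cannot serve both low-feature environments with one action, the second slot's losses push the second classifier toward the action that covers the environment the first one missed.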

Algorithm 2 has a similar structure to Algorithm 1. This 
alternate formulation has the advantage of being able to add 
actions to the control library without retraining the sequence 
of classifiers. Instead of directly identifying a target class, we 
use a squared-loss regressor in each slot to produce an estimate 
of the marginal benefit from each action at that particular slot. 
Hence $M_{B_i}$ is a $|D| \times |\mathcal{Y}|$ matrix of the actual marginal benefit 
computed in a similar fashion as $M_{L_i}$ of Algorithm 1, and $\tilde{M}_{B_i}$ 
is the estimate given by our regressor at the $i$-th slot. In line 2 we 
compute the feature matrix $X_i$. In this case, a feature vector 
is computed per action per environment, and uses information 
from the previous slots' target choice $Y_{\Re_i}$. For feature vectors 
of length L, $X_i$ has dimensions $|D||\mathcal{Y}| \times L$. The features and 
marginal benefits at the $i$-th slot are used to train regressor $\Re_i$, 
producing the estimate $\tilde{M}_{B_i}$. We then pick the action $a$ which 
produces the maximum $\tilde{M}_{B_i}$ to be our target choice $Y_{\Re_i}$, a 
$|D|$-length vector of indices into $\mathcal{Y}$ for each environment. 
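A regressor-based variant can be sketched along the same lines. The feature map `phi`, the lookup-table regressor `fit_mean_by_key`, the toy environments, and the objective (one minus the minimum cost) are all assumptions for illustration; a real system would use a proper squared-loss regressor and richer per-action features.

```python
def objective(env, seq):
    """Assumed objective: 1 - min cost over the sequence, 0 for the empty sequence."""
    return 1.0 - min(env["cost"][a] for a in seq) if seq else 0.0

def phi(env, action, prefix):
    """Hypothetical per-(environment, action) features that also see earlier picks."""
    return (env["x"] < 0.5, action, action in prefix)

def fit_mean_by_key(X, y):
    """Toy squared-loss regressor: mean target per distinct feature tuple."""
    sums = {}
    for x, t in zip(X, y):
        total, count = sums.get(x, (0.0, 0))
        sums[x] = (total + t, count + 1)
    table = {k: total / count for k, (total, count) in sums.items()}
    return lambda x: table.get(x, 0.0)

def train_regressors(envs, actions, n_slots, fit):
    """One regressor per slot, trained on |D||Y| rows of marginal benefits."""
    regressors = []
    chosen = [[] for _ in envs]
    for _ in range(n_slots):
        X_i, benefits = [], []
        for n, env in enumerate(envs):
            base = objective(env, chosen[n])
            for a in actions:
                X_i.append(phi(env, a, chosen[n]))
                benefits.append(objective(env, chosen[n] + [a]) - base)
        reg = fit(X_i, benefits)
        regressors.append(reg)
        for n, env in enumerate(envs):
            est = {a: reg(phi(env, a, chosen[n])) for a in actions}
            chosen[n].append(max(est, key=est.get))
    return regressors

def rank_actions(regressors, env, actions):
    """At test time, pick the action with the highest estimated benefit per slot."""
    seq = []
    for reg in regressors:
        est = {a: reg(phi(env, a, seq)) for a in actions}
        seq.append(max(est, key=est.get))
    return seq

envs = [
    {"x": 0.1, "cost": {"a": 0.1, "b": 0.9, "c": 0.8}},
    {"x": 0.2, "cost": {"a": 0.9, "b": 0.2, "c": 0.8}},
    {"x": 0.8, "cost": {"a": 0.9, "b": 0.8, "c": 0.1}},
    {"x": 0.9, "cost": {"a": 0.9, "b": 0.3, "c": 0.2}},
]
actions = ["a", "b", "c"]
regressors = train_regressors(envs, actions, 2, fit_mean_by_key)
ranked = rank_actions(regressors, {"x": 0.15}, actions)
```

Since the regressor scores features of an action rather than an action identifier, a newly added action only needs its features computed to be scored, which is the property the text highlights.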

D. Reduction Argument 

We establish a formal regret reduction [3] between cost-sensitive 
multi-class classification error and the resulting error 
on the learned sequence of classifiers. Specifically, we 
demonstrate that if we consider the control actions to be the 
classes and train a series of classifiers, one for each slot of the 
sequence, on the features of a distribution of environments, 
then we can produce a near-optimal sequence of classifiers. 
This sequence of classifiers can be invoked to approximate 
the greedy sequence constructed by allowing additive error in 
equation (3). 

Theorem 1. If each of the classifiers ($\pi_i$) trained in Algorithm 1 
achieves multi-class cost-sensitive regret of $r_i$, 
then the resulting sequence of classifiers achieves at least 
$(1 - \frac{1}{e}) \max_{S \in \mathcal{S}} f(S) - \sum_{i=1}^{N} r_i$, where the maximum 
is over sequences of classifiers S from the same hypothesis space.^ 

Proof: (Sketch) Define the loss of a multi-class, 
cost-sensitive classifier $\pi$ over a distribution of environments 
$D$ as $l(\pi, D)$. Each example can be represented 
as $(x_n, m_n^1, m_n^2, m_n^3, \ldots, m_n^{|\mathcal{Y}|})$ where $x_n$ is the set 

"^When the objective is to minimize the time (depth in sequence) to find a 
satisficing element then the resulting sequence of classifiers <4JqI — 



of features representing the $n$-th example environment and 
$m_n^1, m_n^2, m_n^3, \ldots, m_n^{|\mathcal{Y}|}$ are the per-class costs of misclassifying 
$x_n$. $m_n^1, m_n^2, m_n^3, \ldots, m_n^{|\mathcal{Y}|}$ are simply the $n$-th row of $M_{L_i}$ (which 
corresponds to the $n$-th environment in the dataset D). The best 
class has a misclassification cost of 0 while the others are greater 
than or equal to 0 (there might be multiple actions which will 
yield equal marginal benefit). Classifiers generally minimize 
the expected loss 
$l(\pi, D) = \mathbb{E}_{(x_n, m_n^1, \ldots, m_n^{|\mathcal{Y}|}) \sim D}[C_\pi(x_n)]$, where 
$C_\pi(x_n) = m_n^{\pi(x_n)}$ denotes the example-dependent multi-class 
misclassification cost. The best classifier in the hypothesis 
space $\Pi$ minimizes $l(\pi, D)$: 

$$\pi^* = \operatorname{argmin}_{\pi \in \Pi} \mathbb{E}_{(x_n, m_n^1, \ldots, m_n^{|\mathcal{Y}|}) \sim D}[C_\pi(x_n)] \quad (2)$$

The regret of $\pi$ is defined as $r = l(\pi, D) - l(\pi^*, D)$. Each 
classifier associated with the $i$-th slot of the sequence has a regret 
$r_i$. 

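Streeter et al. [21] consider the case where the $i$-th decision 
made by the greedy algorithm is performed with additive error 
$\epsilon_i$. Denote by $\hat{S} = \langle \hat{s}_1, \hat{s}_2, \ldots, \hat{s}_N \rangle$ a variant of the sequence S in 
which the $i$-th argmax is evaluated with additive error $\epsilon_i$. This 
can be formalized as 

$$f(\hat{S}_i \oplus \hat{s}_i) - f(\hat{S}_i) \ge \max_{s_i \in \mathcal{Y}} f(\hat{S}_i \oplus s_i) - f(\hat{S}_i) - \epsilon_i \quad (3)$$

where $\hat{S}_i = \langle \hat{s}_1, \hat{s}_2, \ldots, \hat{s}_{i-1} \rangle$ for $i \ge 1$ and $\hat{s}_i$ is the 
predicted control action by classifier $\pi_i$. They demonstrate that, 
for a budget or sequence length of N, 

$$f(\hat{S}_{\langle N \rangle}) \ge (1 - \frac{1}{e}) \max_{S \in \mathcal{S}} f(S) - \sum_{i=1}^{N} \epsilon_i \quad (4)$$

assuming each control action takes equal time to execute. 

Thus the $i$-th argmax in equation (3) is chosen with some 
error $\epsilon_i = r_i$. An error made by classifier $\pi_i$ corresponds to 
the classifier picking an action whose marginal gain is less 
than the maximum possible. Hence the performance bound on 
additive error greedy sequence construction stated in equation 
(4) can be restated as 

$$f(\hat{S}_{\langle N \rangle}) \ge (1 - \frac{1}{e}) \max_{S \in \mathcal{S}} f(S) - \sum_{i=1}^{N} r_i \quad (5)$$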



Theorem 2. The sequence of squared-loss regressors ($\Re_i$) 
trained in Algorithm 2 achieves at least 
$(1 - \frac{1}{e}) \max_{S \in \mathcal{S}} f(S) - \sum_{i=1}^{N} \sqrt{2(|\mathcal{Y}| - 1) r_{\mathrm{reg}_i}}$, 
where the maximum is over sequences of classifiers S 
from the hypothesis space of multi-class cost-sensitive classifiers. 

Proof: (Sketch) Langford et al. [14] show that the reduction 
from multi-class classification to squared-loss 
regression has a regret bound of $\sqrt{2(k - 1) r_{\mathrm{reg}}}$, where $k$ 
is the number of classes and $r_{\mathrm{reg}}$ is the squared-loss regret 
on the underlying regression problem. In Algorithm 2 we use 
squared-loss regression to perform multi-class classification, 
thereby incurring for each slot of the sequence a reduction 
regret of $\sqrt{2(|\mathcal{Y}| - 1) r_{\mathrm{reg}_i}}$, where $|\mathcal{Y}|$ is the number of actions 
in the control library. Theorem 1 states that the sequence of 
classifiers satisfies 
$f(\hat{S}_{\langle N \rangle}) \ge (1 - \frac{1}{e}) \max_{S \in \mathcal{S}} f(S) - \sum_{i=1}^{N} r_i$. 
Plugging in the regret reduction from [14], we get the result that the 
resulting sequence of regressors in Algorithm 2 achieves 
at least $(1 - \frac{1}{e}) \max_{S \in \mathcal{S}} f(S) - \sum_{i=1}^{N} \sqrt{2(|\mathcal{Y}| - 1) r_{\mathrm{reg}_i}}$ relative to the 
optimal sequence of multi-class cost-sensitive classifiers. 
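To get a feel for how the two guarantees scale, the bounds can be evaluated numerically. The regret values below are made-up inputs chosen only for illustration, and the function names are hypothetical; the formulas are the ones stated in Theorems 1 and 2.

```python
import math

def classifier_bound(opt_value, regrets):
    """Theorem 1 lower bound: (1 - 1/e) * max_S f(S) - sum_i r_i."""
    return (1.0 - 1.0 / math.e) * opt_value - sum(regrets)

def regressor_bound(opt_value, reg_regrets, n_actions):
    """Theorem 2 lower bound with a sqrt(2 (|Y| - 1) r_reg_i) penalty per slot."""
    penalty = sum(math.sqrt(2.0 * (n_actions - 1) * r) for r in reg_regrets)
    return (1.0 - 1.0 / math.e) * opt_value - penalty

# Hypothetical inputs: optimal value 1.0, three slots, a 30-action library.
b_cls = classifier_bound(1.0, [0.02, 0.02, 0.02])
b_reg = regressor_bound(1.0, [1e-4, 1e-4, 1e-4], n_actions=30)
```

With these inputs the classifier bound is roughly 0.57 and the regressor bound roughly 0.40, even though the regression regrets are two orders of magnitude smaller: the square root and the dependence on library size make the regressor guarantee looser for the same per-slot regret.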



III. Case Studies 

A. Robot Manipulation Planning via Contextual Control Li- 
braries 

We demonstrate the application of ConSeqOpt for manip- 
ulation planning on a 7 degree of freedom manipulator. 

Recent work [TT| has shown that by relaxing the hard 
constraint of avoiding obstacles into a soft penalty term on 
collision, simple local optimization techniques can quickly 
lead to smooth, collision-free trajectories suitable for robot 
execution. Often the default initialization trajectory seed is 
a simple straight-line initialization in joint space ifTSll . This 
heuristic is surprisingly effective in many environments, but 
suffers from local convergence and may fail to find a trajectory 
when one exists. In practice, this may be tackled by providing 
cleverer initialization seeds fill EH- While these methods 
reduce the chance of falling into local minima, they do not 
have any alternative plans should the chosen initialization 
seed fail. A contextual ranking of a library of initialization 
trajectory seeds can provide feasible alternative seeds should 
earlier choices fail. Proposed initialization trajectory seeds can 
be developed in many ways including human demonstration 
ifTTl or use of a slow but complete planner|13 |. 

For this experiment we attempt to plan a trajectory to a 
pre-grasp pose over the target object in a cluttered environment 
using the local optimization planner CHOMP [18] 
and minimize the total planning and execution time of the 
trajectory. A training dataset of |D| = 310 environments and 
test dataset of 212 environments are generated. Positions of the 
target object and obstacles on the table are randomly assigned. 
To populate the control library, we consider initialization 
trajectories that move first to an "exploration point" and 
then to the goal. The exploration points are generated by 
randomly perturbing the midpoint of the original straight line 
initialization in joint space. The resulting initial trajectories 
are then piecewise straight lines in joint space from the start 
point to the exploration point, and from the exploration point 
to the goal. Half of the seed trajectories are prepended with a 
short path to start from a left-arm configuration, and half are 
in right-arm configuration. This is because the local planner 
has a difficult time switching between configurations, while 
environmental context can provide a lot of information about 
which configuration to use. 30 trajectories are generated and 
form our action library. Figure 2a shows an example set for 
a particular environment. Notice that in this case the straight-line 
initialization of CHOMP goes through the obstacle and 
therefore CHOMP has a difficult time finding a valid trajectory 
using this initial seed. 

In our results we use a small number (1 to 3) of slots in our 
sequence to ensure the overhead of ordering and evaluating the 
library is small. When CHOMP fails to find a collision-free 
trajectory for multiple initializations seeds, one can always 
fall back on slow but complete planners. Thus the contextual 
control sequence's role is to quickly evaluate a few good 
options and choose the initialization trajectory that will result 
in the minimum execution time. We note that in our experi- 
ments, the overhead of ordering and evaluating the library is 
negligible as we rely on a fast predictor and features computed 
as part of the trajectory optimization, and by choosing a 
small sequence length we can effectively compute a motion 
plan with expected planning time under 0.5s. We can solve 
most manipulation problems that arise in our manipulation 
research very quickly, falling back to initializing the trajectory 
optimization with a complete motion planner only in the most 
difficult of circumstances. 

For each initialization trajectory, we calculate 17 simple 
feature values which populate a row of the feature matrix 
$X_i$: length of trajectory in joint space; length of trajectory in 
task space; the xyz values of the end effector position at the 
exploration point (3 values); the distance field values used 
by CHOMP at the quarter points of the trajectory (3 values); 
joint values of the first 4 joints at both the exploration point 
(4 values) and the target pose (4 values); and whether the 
initialization seed is in the same left/right kinematic arm 
configuration as the target pose. During training time, we evaluate 
each initialization seed in our library on all environments in 
the training set, and use their performance and features to train 
each regressor $\Re_i$ in ConSeqOpt. At test time, we simply 
run Algorithm 2 without the training step to produce $Y_{\Re_1}, \ldots, Y_{\Re_N}$ 
as the sequence of initialization seeds to be evaluated. Note 
that while the first regressor uses only the 17 basic features, 
the subsequent regressors also include the difference in feature 
values between the remaining actions and the actions chosen 
by the previous regressors. These difference features improve 
the algorithm's ability to consider trajectory diversity in the 
actions chosen. 
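One plausible way to build such difference features is sketched below. The layout (base features followed by one block of element-wise differences per previously chosen action) is an assumption; the text does not specify how the differences are arranged, and `difference_features` is a hypothetical helper.

```python
def difference_features(base, chosen):
    """Base features of a candidate action followed by its element-wise
    difference from each previously chosen action's features."""
    feats = list(base)
    for prev in chosen:
        feats.extend(b - p for b, p in zip(base, prev))
    return feats

# Small illustration with 2 features and one previously chosen action.
example = difference_features([1.0, 2.0], [[0.5, 0.5]])

# With 17 base features and two earlier slots, the third regressor would see
# 17 + 2 * 17 = 51 values per candidate under this assumed layout.
third_slot = difference_features([0.0] * 17, [[1.0] * 17, [2.0] * 17])
```

A candidate that is nearly identical to an earlier pick produces differences close to zero, which gives the regressor a direct signal that the candidate adds little diversity.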

We compare ConSeqOpt with two other methods of 
ranking the initialization library: a random ordering of the 
actions, and an ordering by sorting the output of the first 
regressor. Sorting by the first regressor is functionally the same 
as maximizing the absolute benefit rather than the marginal 
benefit at each slot. We compare both the number of CHOMP 
failures as well as the average execution time of the final 
trajectory. For execution time, we assume the robot can be 
actuated at 1 rad/second for each joint and use the shortest 
trajectory generated using the seeds ranked by ConSeqOpt 
as the performance. If we fail to find a collision-free trajectory 
and need to fall back to a complete planner (RRT [13] plus 
trajectory optimization), we apply a maximum execution time 
penalty of 40 seconds due to the longer computation time and 
resulting trajectory. 
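The evaluation metric can be sketched as follows. The 1 rad/s joint speed and the 40 second penalty come from the text; treating a trajectory as piecewise linear in joint space with all joints moving simultaneously (so each segment takes as long as its largest joint displacement) is an assumption, as are the function names.

```python
FALLBACK_PENALTY_S = 40.0  # penalty applied when a complete planner is needed

def execution_time(waypoints, joint_speed=1.0):
    """Time to follow a piecewise-linear joint-space path when every joint moves
    at up to joint_speed rad/s and joints move simultaneously (assumed model)."""
    total = 0.0
    for q0, q1 in zip(waypoints, waypoints[1:]):
        total += max(abs(b - a) for a, b in zip(q0, q1)) / joint_speed
    return total

def sequence_score(collision_free_paths):
    """Shortest execution time among the trajectories found from the ranked
    seeds, or the fallback penalty when every seed failed."""
    if not collision_free_paths:
        return FALLBACK_PENALTY_S
    return min(execution_time(p) for p in collision_free_paths)

# Hypothetical 2-joint path: the segments take 1.0 s and 2.0 s respectively.
path = [[0.0, 0.0], [1.0, 0.5], [1.0, 2.5]]
t = execution_time(path)
```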

Fig. 2: CHOMP initialization trajectories generated as control 
actions for ConSeqOpt. Blue lines trace the end effector path 
of each trajectory in the library. Orange lines in each image 
trace the initialization seed generated by the default straight-line 
approach and by ConSeqOpt, respectively. 
(a) The default straight-line initialization of CHOMP is marked in 
orange. Notice this initial seed goes straight through the obstacle and 
causes CHOMP to fail to find a collision-free trajectory. 
(b) The initialization seed for CHOMP found using ConSeqOpt is 
marked in orange. Using this initial seed CHOMP is able to find a 
collision-free path that also has a relatively short execution time. 

The results over 212 test environments are summarized 
in Figure 3. With only simple straight-line initialization, 
CHOMP is unable to find a collision-free trajectory in 162/212 
environments, with a resulting average execution time of 33.4s. 
While a single regressor (N = 1) can reduce the number of 
CHOMP failures from 162 to 79 and the average execution 
time from 33.4s to 18.2s, when we extend the sequence 
length, ConSeqOpt is able to reduce both metrics faster 
than a ranking by sorting the output of the first regressor. 
This is because for N > 1, ConSeqOpt chooses a primitive 
that provides the maximum marginal benefit, which results 
in trajectory seeds that have very different features from the 
previous slots' choices. Ranking by the absolute benefit tends 
to pick trajectory seeds that are similar to each other, and 
thus are more likely to fail when the previous seeds fail. At a 
sequence length of 3, ConSeqOpt has only 16 failures and an 
average execution time of 3 seconds. This is a 90% improvement 
in success rate and a 75% reduction in execution time. 
Note that planning times are generally negligible compared to 
execution times for manipulation, hence this improvement is 
significant. Figure 2b shows the initialization seed found by 
ConSeqOpt for the same environment as in Figure 2a. Notice 
that this seed avoids collision with the obstacle between the 
manipulator and the target object, enabling CHOMP to produce 
a successful trajectory. 

B. Mobile Robot Navigation 

An effective means of path planning for mobile robots is 
to sample a budgeted number of trajectories from a large 
library of feasible trajectories, traverse the one with the 
lowest cost of traversal for a small portion, and then repeat 
the process. The sub-sequence of trajectories is usually 
computed offline [9, 7]. Such methods are widely used in 
modern autonomous ground robots, including the two highest 
placing teams in the DARPA Urban Challenge and Grand Challenge 
[25], [15], [24], [23], and the LAGR [10], UPI [1], and Perceptor [12] 
programs. We use ConSeqOpt to maximize this function and 
generate trajectory sequences that take the current environment 
features into account. 
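
The sample, evaluate, traverse, and repeat loop described above can be sketched on a toy grid. This is only an illustration under stated assumptions: the grid cost map, the five-entry trajectory library, the Manhattan distance-to-goal term, and the names `rollout`, `traversal_cost`, and `navigate` are all hypothetical and are not taken from the systems cited.

```python
import math
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) in the cost map
Move = Tuple[int, int]  # (d_row, d_col) unit step


def rollout(pos: Cell, moves: List[Move]) -> List[Cell]:
    """Cells visited when the moves are applied starting from pos."""
    cells = []
    r, c = pos
    for dr, dc in moves:
        r, c = r + dr, c + dc
        cells.append((r, c))
    return cells


def traversal_cost(pos: Cell, goal: Cell, moves: List[Move], cost_map) -> float:
    """Sum of visited cell costs plus Manhattan distance from the end cell to the goal."""
    total, end = 0.0, pos
    for r, c in rollout(pos, moves):
        if not (0 <= r < len(cost_map) and 0 <= c < len(cost_map[0])):
            return math.inf  # trajectory leaves the map
        total += cost_map[r][c]
        end = (r, c)
    return total + abs(goal[0] - end[0]) + abs(goal[1] - end[1])


def navigate(start, goal, library, budget, cost_map, execute_steps=1, max_iters=100):
    """Evaluate the first `budget` trajectories, follow the cheapest briefly, replan."""
    pos, path = start, [start]
    for _ in range(max_iters):
        if pos == goal:
            break
        candidates = library[:budget]
        scored = [(traversal_cost(pos, goal, m, cost_map), i)
                  for i, m in enumerate(candidates)]
        best_cost, best_idx = min(scored)  # ties go to the earlier trajectory
        if math.isinf(best_cost):
            break  # no feasible trajectory within the budget
        for cell in rollout(pos, candidates[best_idx])[:execute_steps]:
            pos = cell
            path.append(pos)
            if pos == goal:
                break
    return path


def path_cost(path, cost_map):
    return sum(cost_map[r][c] for r, c in path[1:])


cost_map = [
    [1, 1, 1, 1, 1],
    [1, 9, 9, 1, 1],  # the 9s are an obstacle between start and goal
    [1, 1, 1, 1, 1],
]
library = [
    [(0, 1), (0, 1)],   # right, right
    [(-1, 0), (0, 1)],  # up, right
    [(1, 0), (0, 1)],   # down, right
    [(0, 1), (1, 0)],   # right, down
    [(0, 1), (-1, 0)],  # right, up
]
path_full = navigate((1, 0), (1, 3), library, budget=5, cost_map=cost_map)
path_one = navigate((1, 0), (1, 3), library, budget=1, cost_map=cost_map)
print(path_full, path_cost(path_full, cost_map))
print(path_one, path_cost(path_one, cost_map))
```

Because only the first `budget` entries of the library are evaluated at each step, the order of the sequence determines which trajectories are available. In this toy map, a budget of one forces the robot through the obstacle cells.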



Fig. 3: Results of ConSeqOpt for manipulation planning in 
212 test environments. The top image shows the number of 
CHOMP failures for three different methods after each slot 
in the sequence. ConSeqOpt not only significantly reduces 
the number of CHOMP failures in the first slot, but also 
reduces the failure rate faster than both the other 
methods when the sequence length is increased. The same 
trend is observed in the bottom image, which shows the 
average execution time of the chosen trajectory. The 'No Seed' 
column refers to the straight-line heuristic used by the original 
CHOMP implementation. 

Figures 4a and 4b show a section of Fort Hood, TX and 
the corresponding robot cost-map, respectively. We simulated 
a robot traversing between various random starting and goal 
locations using the maximum-discrepancy trajectory [9] 
sequence as well as sequences generated by ConSeqOpt using 
Algorithm 1. A texton library [26] of 25 k-means cluster 
centers was computed for the whole overhead map. At run-time, 
the texton histogram for the image patch around the robot 
was used as features. Online linear support vector machines 
(SVM) with slack re-scaling [19] were used as the cost-sensitive 
classifiers for each slot. Over 580 runs using N = 30 
trajectories, we report a 9.6% decrease in the cost of traversal 
compared to offline precomputed trajectory sequences 
that maximize the area between selected trajectories [9]. Our 
approach is able to choose which trajectories to use at each 
step based on the appearance of the terrain (woods, brush, roads, 
etc.). As seen in Figure 4c, at each time step ConSeqOpt selects 
trajectories so that most of them fall in the empty 
space around obstacles. 
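
A texton histogram of the kind used as features here can be sketched as follows. This is a minimal illustration, assuming each pixel is described by a small feature vector and that the histogram is normalized to sum to one. The three toy cluster centers stand in for the 25 k-means centers, and the function names are hypothetical.

```python
from typing import List, Sequence


def nearest_center(feature: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the cluster center closest to the feature (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, centers[k])),
    )


def texton_histogram(patch_features, centers) -> List[float]:
    """Fraction of pixels in the patch assigned to each cluster center."""
    counts = [0] * len(centers)
    for feature in patch_features:
        counts[nearest_center(feature, centers)] += 1
    total = len(patch_features)
    return [n / total for n in counts] if total else [0.0] * len(centers)


# Toy centers (assumed): roughly "vegetation", "dirt", and "road" colors.
centers = [(0, 128, 0), (139, 69, 19), (128, 128, 128)]
# Toy patch around the robot: one feature vector per pixel.
patch = [(10, 120, 5), (0, 130, 10), (130, 125, 129), (140, 70, 20)]

histogram = texton_histogram(patch, centers)
print(histogram)
```

The resulting vector has one entry per cluster center and would serve as the context passed to the classifier for each slot.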

Fig. 4: (a) Overhead color map of a portion of Fort Hood, TX. 
(b) Cost map of the corresponding portion. (c) Robot traversing 
the map using ConSeqOpt, generating trajectory sequences which 
try to avoid obstacles in the vicinity. 

References 

[1] J.A. Bagnell, D. Bradley, D. Silver, B. Sofman, and 
A. Stentz. Learning for autonomous navigation. Robotics 
& Automation Magazine, IEEE, 17(2):74-84, June 2010. 

[2] D. Berenson, R. Diankov, K. Nishiwaki, S. Kagami, and 
J. Kuffner. Grasp planning in complex scenes. In IEEE-RAS 
Humanoids, December 2007. 

[3] A. Beygelzimer, V. Dani, T. Hayes, J. Langford, and 
B. Zadrozny. Error limiting reductions between classification 
tasks. In ICML. ACM, 2005. 

[4] Andreas Krause and Carlos Guestrin. Beyond convexity: 
Submodularity in machine learning. URL 
www.submodularity.org. 

[5] E. Chinellato, R.B. Fisher, A. Morales, and A.R del 
Pobil. Ranking planar grasp configurations for a three- 
finger hand. In ICRA, volume 1, pages 1133-1138. IEEE, 
2003. 

[6] Debadeepta Dey, Tian Yu Liu, Boris Sofman, and J. Andrew 
(Drew) Bagnell. Efficient optimization of control 
libraries. Technical Report CMU-RI-TR-11-20, Robotics 
Institute, Pittsburgh, PA, June 2011. 

[7] L.H. Erickson and S.M. LaValle. Survivability: Measur- 
ing and ensuring path diversity. In ICRA, pages 2068- 
2073. IEEE, 2009. 

[8] E. Frazzoli, M.A. Dahleh, and E. Feron. Robust hybrid 
control for autonomous vehicle motion planning. In 
Decision and Control, 2000., volume 1, 2000. 

[9] C. Green and A. Kelly. Optimal sampling in the space 
of paths: Preliminary results. Technical Report CMU-RI- 
TR-06-51, Robotics Institute, Pittsburgh, PA, November 
2006. 

[10] L.D. Jackel et al. The DARPA LAGR program: Goals, 
challenges, methodology, and phase I results. JFR, 2006. 

[11] N. Jetchev and M. Toussaint. Trajectory prediction: 
learning to map situations to robot trajectories. In ICML. 
ACM, 2009. 

[12] Alonzo Kelly et al. Toward reliable off road autonomous 
vehicles operating in challenging environments. IJRR, 25 
(1):449-483, May 2006. 

[13] J.J. Kuffner, Jr. and S.M. LaValle. RRT-Connect: An 
efficient approach to single-query path planning. In 
ICRA, volume 2, pages 995-1001, 2000. 

[14] John Langford and Alina Beygelzimer. Sensitive error 
correcting output codes. In Learning Theory, Lecture 
Notes in Computer Science, 2005. 

[15] M. Montemerlo et al. Junior: The Stanford entry in the 
urban challenge. JFR, 25(9):569-597, 2008. 

[16] Filip Radlinski, Robert Kleinberg, and Thorsten 
Joachims. Learning diverse rankings with multi-armed 
bandits. In Proceedings of the 25th ICML, 2008. 

[17] N. Ratliff, J. A. Bagnell, and S. Srinivasa. Imitation learning 
for locomotion and manipulation. Technical Report 
CMU-RI-TR-07-45, Robotics Institute, Pittsburgh, PA, 
December 2007. 

[18] Nathan Ratliff, Matthew Zucker, J. Andrew (Drew) 
Bagnell, and Siddhartha Srinivasa. CHOMP: Gradient 
optimization techniques for efficient motion planning. In 
ICRA, May 2009. 

[19] B. Schölkopf and A.J. Smola. Learning with Kernels: 
Support Vector Machines, Regularization, Optimization, 
and Beyond. The MIT Press, 2002. 

[20] M. Stolle and C.G. Atkeson. Policies based on trajectory 
libraries. In ICRA, pages 3344-3349. IEEE, 2006. 

[21] M. Streeter and D. Golovin. An online algorithm for 
maximizing submodular functions. In NIPS, pages 1577-1584, 
2008. 

[22] M. Streeter, D. Golovin, and A. Krause. Online learning 
of assignments. In NIPS, pages 1794-1802. Citeseer, 
2009. 

[23] Sebastian Thrun et al. Stanley: The robot that won the 
DARPA grand challenge: Research articles. J. Robot. Syst., 
23:661-692, September 2006. 

[24] Christopher Urmson et al. A robust approach to high-speed 
navigation for unrehearsed desert terrain. JFR, 23 
(1):467-508, August 2006. 

[25] Christopher Urmson et al. Autonomous driving in urban 
environments: Boss and the urban challenge. JFR, June 
2008. 

[26] J. Winn, A. Criminisi, and T. Minka. Object catego- 
rization by learned universal visual dictionary. In ICCV, 
2005. 

[27] Matthew Zucker. A data-driven approach to high level 
planning. Technical Report CMU-RI-TR-09-42, Robotics 
Institute, Pittsburgh, PA, January 2009. 



